Food. Everyone eats it, most people love it, and even more people discuss it; we could talk about it for hours. India is aptly known as the "Land of Spices" and produces a wider variety of spices than any other nation. With the arrival of numerous domestic and foreign businesses, the restaurant industry in India has undergone an extraordinary transformation, which has created a strong demand for qualified professionals in the sector and in related industries. To attract more customers and serve them better, Indian restaurants have also moved online on the back of the technological revolution.
Demand and supply, however, are not in balance. A visible shortage of competent workers means the restaurant business offers a wide range of opportunities, and this is where culinary arts schools come in. To meet industry expectations, conventional cookery schools and hotel management colleges have broadened the scope of their courses, and universities in India are investing time and money in training students and preparing them for the workforce.
It is not surprising that the food services market has changed as people eat out more often. The Indian food service industry has advanced significantly since the early 1990s, when it was dominated by small, unorganised businesses.
The revolution started in 1996, when companies including McDonald's, Pizza Hut, Domino's Pizza, Subway, and Yo!China opened outlets in India. The food services market has been expanding ever since. The good news is that the industry is expected to keep growing for many years, driven by rising disposable incomes, a growing young population, more consumers in smaller towns, greater exposure to different cultures and cuisines, and a rising propensity to eat out. This analysis is meant primarily to help new restaurants examine the factors that affect where they should locate.
The primary goal of the Zomato dataset analysis is to gain a clear understanding of the variables that influence each restaurant's overall rating. Many types of restaurants have been established across the city, with Bengaluru having more than 50,000 restaurants that serve food from all over the world. New restaurants launch every day, so the market is still young and demand is still growing. Despite the rising demand, however, it has become more challenging for new restaurants to compete with existing ones, since the majority of them serve similar cuisine. Bengaluru is India's IT capital, and because most people do not have the time to prepare their own meals, the majority of locals rely primarily on restaurant food. With such strong demand for eateries, it is now crucial to study a location's demography. What kind of food is most widely consumed there? Does the entire community prefer vegetarian food? If so, does the area consist mostly of members of one particular group, such as Jain, Marwari, or Gujarati vegetarians?
We have always been intrigued by Bengaluru's culinary scene. The city is home to restaurants from all over the world, and you can find every type of cuisine here, from the United States to Japan, Russia to Antarctica. You name it, Bengaluru has it, which makes it the best city for foodies. Restaurants are becoming more numerous every day, and the count currently stands at around 12,000 eateries. The market has not yet reached saturation, and new eateries keep appearing, but they find it challenging to compete with restaurants that have already achieved success. The main problems they continue to face include high real estate prices, rising food prices, a lack of qualified workers, a fragmented supply chain, and over-licensing.
This research intends to analyse the area's demography and culinary culture. The most significant benefit is that it will assist new restaurants in selecting their theme, menus, cuisine, price, etc. for a certain area. Additionally, it looks for culinary similarities among Bengaluru neighbourhoods. People will be able to select a restaurant based on the analysis and a number of other criteria.
The project's main goal is to attempt to answer questions that matter to restaurants and foodies, including what should be considered when a new restaurant is to be opened.
The dataset contains 17 variables, all of which were scraped from the Zomato website. It holds details of more than 50,000 restaurants across Bengaluru's neighbourhoods. To the best of my knowledge, the data matches what was available on the Zomato website up to 15 March 2019.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly
import plotly.express as px
import plotly.graph_objs as go
from geopy.geocoders import Nominatim
import folium
from folium.plugins import HeatMap
from folium.plugins import FastMarkerCluster
from plotly import tools
import re
from plotly.offline import init_notebook_mode, plot, iplot
from wordcloud import WordCloud, STOPWORDS
from warnings import filterwarnings
filterwarnings('ignore')
df = pd.read_csv("zomato.csv")
df.head()
| | url | address | name | online_order | book_table | rate | votes | phone | location | rest_type | dish_liked | cuisines | approx_cost(for two people) | reviews_list | menu_item | listed_in(type) | listed_in(city) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.zomato.com/bangalore/jalsa-banasha... | 942, 21st Main Road, 2nd Stage, Banashankari, ... | Jalsa | Yes | Yes | 4.1/5 | 775 | 080 42297555\r\n+91 9743772233 | Banashankari | Casual Dining | Pasta, Lunch Buffet, Masala Papad, Paneer Laja... | North Indian, Mughlai, Chinese | 800 | [('Rated 4.0', 'RATED\n A beautiful place to ... | [] | Buffet | Banashankari |
| 1 | https://www.zomato.com/bangalore/spice-elephan... | 2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ... | Spice Elephant | Yes | No | 4.1/5 | 787 | 080 41714161 | Banashankari | Casual Dining | Momos, Lunch Buffet, Chocolate Nirvana, Thai G... | Chinese, North Indian, Thai | 800 | [('Rated 4.0', 'RATED\n Had been here for din... | [] | Buffet | Banashankari |
| 2 | https://www.zomato.com/SanchurroBangalore?cont... | 1112, Next to KIMS Medical College, 17th Cross... | San Churro Cafe | Yes | No | 3.8/5 | 918 | +91 9663487993 | Banashankari | Cafe, Casual Dining | Churros, Cannelloni, Minestrone Soup, Hot Choc... | Cafe, Mexican, Italian | 800 | [('Rated 3.0', "RATED\n Ambience is not that ... | [] | Buffet | Banashankari |
| 3 | https://www.zomato.com/bangalore/addhuri-udupi... | 1st Floor, Annakuteera, 3rd Stage, Banashankar... | Addhuri Udupi Bhojana | No | No | 3.7/5 | 88 | +91 9620009302 | Banashankari | Quick Bites | Masala Dosa | South Indian, North Indian | 300 | [('Rated 4.0', "RATED\n Great food and proper... | [] | Buffet | Banashankari |
| 4 | https://www.zomato.com/bangalore/grand-village... | 10, 3rd Floor, Lakshmi Associates, Gandhi Baza... | Grand Village | No | No | 3.8/5 | 166 | +91 8026612447\r\n+91 9901210005 | Basavanagudi | Casual Dining | Panipuri, Gol Gappe | North Indian, Rajasthani | 600 | [('Rated 4.0', 'RATED\n Very good restaurant ... | [] | Buffet | Banashankari |
df.shape
(51717, 17)
df.columns
Index(['url', 'address', 'name', 'online_order', 'book_table', 'rate', 'votes',
'phone', 'location', 'rest_type', 'dish_liked', 'cuisines',
'approx_cost(for two people)', 'reviews_list', 'menu_item',
'listed_in(type)', 'listed_in(city)'],
dtype='object')
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51717 entries, 0 to 51716
Data columns (total 17 columns):
 #   Column                       Non-Null Count  Dtype
---  ------                       --------------  -----
 0   url                          51717 non-null  object
 1   address                      51717 non-null  object
 2   name                         51717 non-null  object
 3   online_order                 51717 non-null  object
 4   book_table                   51717 non-null  object
 5   rate                         43942 non-null  object
 6   votes                        51717 non-null  int64
 7   phone                        50509 non-null  object
 8   location                     51696 non-null  object
 9   rest_type                    51490 non-null  object
 10  dish_liked                   23639 non-null  object
 11  cuisines                     51672 non-null  object
 12  approx_cost(for two people)  51371 non-null  object
 13  reviews_list                 51717 non-null  object
 14  menu_item                    51717 non-null  object
 15  listed_in(type)              51717 non-null  object
 16  listed_in(city)              51717 non-null  object
dtypes: int64(1), object(16)
memory usage: 6.7+ MB
df.isnull().sum()
url                                0
address                            0
name                               0
online_order                       0
book_table                         0
rate                            7775
votes                              0
phone                           1208
location                          21
rest_type                        227
dish_liked                     28078
cuisines                          45
approx_cost(for two people)      346
reviews_list                       0
menu_item                          0
listed_in(type)                    0
listed_in(city)                    0
dtype: int64
df.duplicated().sum()
0
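# Map the 'Yes'/'No' flags to booleans; assigning back avoids relying on an in-place edit of a column view
df['online_order'] = df['online_order'].map({'Yes': True, 'No': False})
df['book_table'] = df['book_table'].map({'Yes': True, 'No': False})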
df['rate'].unique()
array(['4.1/5', '3.8/5', '3.7/5', '3.6/5', '4.6/5', '4.0/5', '4.2/5',
'3.9/5', '3.1/5', '3.0/5', '3.2/5', '3.3/5', '2.8/5', '4.4/5',
'4.3/5', 'NEW', '2.9/5', '3.5/5', nan, '2.6/5', '3.8 /5', '3.4/5',
'4.5/5', '2.5/5', '2.7/5', '4.7/5', '2.4/5', '2.2/5', '2.3/5',
'3.4 /5', '-', '3.6 /5', '4.8/5', '3.9 /5', '4.2 /5', '4.0 /5',
'4.1 /5', '3.7 /5', '3.1 /5', '2.9 /5', '3.3 /5', '2.8 /5',
'3.5 /5', '2.7 /5', '2.5 /5', '3.2 /5', '2.6 /5', '4.5 /5',
'4.3 /5', '4.4 /5', '4.9/5', '2.1/5', '2.0/5', '1.8/5', '4.6 /5',
'4.9 /5', '3.0 /5', '4.8 /5', '2.3 /5', '4.7 /5', '2.4 /5',
'2.1 /5', '2.2 /5', '2.0 /5', '1.8 /5'], dtype=object)
# Treat missing ratings and the '-' placeholder as empty strings
df['rate'] = df['rate'].fillna('').replace('-', '')
df['rate'].unique()
array(['4.1/5', '3.8/5', '3.7/5', '3.6/5', '4.6/5', '4.0/5', '4.2/5',
'3.9/5', '3.1/5', '3.0/5', '3.2/5', '3.3/5', '2.8/5', '4.4/5',
'4.3/5', 'NEW', '2.9/5', '3.5/5', '', '2.6/5', '3.8 /5', '3.4/5',
'4.5/5', '2.5/5', '2.7/5', '4.7/5', '2.4/5', '2.2/5', '2.3/5',
'3.4 /5', '3.6 /5', '4.8/5', '3.9 /5', '4.2 /5', '4.0 /5',
'4.1 /5', '3.7 /5', '3.1 /5', '2.9 /5', '3.3 /5', '2.8 /5',
'3.5 /5', '2.7 /5', '2.5 /5', '3.2 /5', '2.6 /5', '4.5 /5',
'4.3 /5', '4.4 /5', '4.9/5', '2.1/5', '2.0/5', '1.8/5', '4.6 /5',
'4.9 /5', '3.0 /5', '4.8 /5', '2.3 /5', '4.7 /5', '2.4 /5',
'2.1 /5', '2.2 /5', '2.0 /5', '1.8 /5'], dtype=object)
df = df.loc[df.rate !='NEW']
df = df.loc[df.rate !=''].reset_index(drop=True)
# np.str was removed from NumPy; the built-in str is the equivalent check
remove_slash = lambda x: x.replace('/5', '') if isinstance(x, str) else x
df.rate = df.rate.apply(remove_slash).str.strip().astype('float')
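The rating clean-up above can also be sketched in a vectorized form. This is a minimal alternative, assuming the raw values look like the ones listed earlier ('4.1/5', '3.8 /5', 'NEW', '-', NaN); `clean_rate` is a hypothetical helper name, and unlike the cells above it leaves unrated entries as NaN rather than dropping the rows.

```python
import numpy as np
import pandas as pd

def clean_rate(raw: pd.Series) -> pd.Series:
    """Turn strings such as '4.1/5' or '3.8 /5' into floats; anything else becomes NaN."""
    stripped = raw.astype(str).str.replace("/5", "", regex=False).str.strip()
    # errors="coerce" maps non-numeric leftovers ('NEW', '-') to NaN
    return pd.to_numeric(stripped, errors="coerce")

sample = pd.Series(["4.1/5", "3.8 /5", "NEW", "-", np.nan])
cleaned = clean_rate(sample)
print(cleaned.tolist())
```

Keeping NaN lets the caller decide later whether unrated restaurants should be dropped or analysed separately.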
df['rate']
0        4.1
1        4.1
2        3.8
3        3.7
4        3.8
        ...
41660    3.7
41661    2.5
41662    3.6
41663    4.3
41664    3.4
Name: rate, Length: 41665, dtype: float64
df[~df['phone'].str.contains('[0-9+]', na = False)]['phone'].unique()
array([nan], dtype=object)
df['phone'] = df['phone'].fillna('')
# Drop rows without a location; calling dropna on the column alone leaves df unchanged
df.dropna(subset=['location'], inplace=True)
df['rest_type'] = df['rest_type'].fillna('')
df['dish_liked'] = df['dish_liked'].fillna('')
df['cuisines'] = df['cuisines'].fillna('')
df = df.rename(columns = {'approx_cost(for two people)':'cost_for_2', 'listed_in(type)':'listed_type', 'listed_in(city)':'city'})
df['cost_for_2'] = df['cost_for_2'].astype(str)
# The comma is a thousands separator ("1,500"), so remove it; replacing it with "." would turn 1,500 into 1.5
df['cost_for_2'] = df['cost_for_2'].apply(lambda x: x.replace(',',''))
df['cost_for_2'] = df['cost_for_2'].astype(float)
df["cost_for_2"].dropna()
0         800.0
1         800.0
2         800.0
3         300.0
4         600.0
          ...
41660     800.0
41661     800.0
41662    1500.0
41663    2500.0
41664    1500.0
Name: cost_for_2, Length: 41418, dtype: float64
df["cost_for_2"].isnull().sum()
247
df.dropna(subset=["cost_for_2"], inplace=True)
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 41418 entries, 0 to 41664
Data columns (total 17 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   url           41418 non-null  object
 1   address       41418 non-null  object
 2   name          41418 non-null  object
 3   online_order  41418 non-null  bool
 4   book_table    41418 non-null  bool
 5   rate          41418 non-null  float64
 6   votes         41418 non-null  int64
 7   phone         41418 non-null  object
 8   location      41418 non-null  object
 9   rest_type     41418 non-null  object
 10  dish_liked    41418 non-null  object
 11  cuisines      41418 non-null  object
 12  cost_for_2    41418 non-null  float64
 13  reviews_list  41418 non-null  object
 14  menu_item     41418 non-null  object
 15  listed_type   41418 non-null  object
 16  city          41418 non-null  object
dtypes: bool(2), float64(2), int64(1), object(12)
memory usage: 5.1+ MB
df.head()
| | url | address | name | online_order | book_table | rate | votes | phone | location | rest_type | dish_liked | cuisines | cost_for_2 | reviews_list | menu_item | listed_type | city |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.zomato.com/bangalore/jalsa-banasha... | 942, 21st Main Road, 2nd Stage, Banashankari, ... | Jalsa | True | True | 4.1 | 775 | 080 42297555\r\n+91 9743772233 | Banashankari | Casual Dining | Pasta, Lunch Buffet, Masala Papad, Paneer Laja... | North Indian, Mughlai, Chinese | 800.0 | [('Rated 4.0', 'RATED\n A beautiful place to ... | [] | Buffet | Banashankari |
| 1 | https://www.zomato.com/bangalore/spice-elephan... | 2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ... | Spice Elephant | True | False | 4.1 | 787 | 080 41714161 | Banashankari | Casual Dining | Momos, Lunch Buffet, Chocolate Nirvana, Thai G... | Chinese, North Indian, Thai | 800.0 | [('Rated 4.0', 'RATED\n Had been here for din... | [] | Buffet | Banashankari |
| 2 | https://www.zomato.com/SanchurroBangalore?cont... | 1112, Next to KIMS Medical College, 17th Cross... | San Churro Cafe | True | False | 3.8 | 918 | +91 9663487993 | Banashankari | Cafe, Casual Dining | Churros, Cannelloni, Minestrone Soup, Hot Choc... | Cafe, Mexican, Italian | 800.0 | [('Rated 3.0', "RATED\n Ambience is not that ... | [] | Buffet | Banashankari |
| 3 | https://www.zomato.com/bangalore/addhuri-udupi... | 1st Floor, Annakuteera, 3rd Stage, Banashankar... | Addhuri Udupi Bhojana | False | False | 3.7 | 88 | +91 9620009302 | Banashankari | Quick Bites | Masala Dosa | South Indian, North Indian | 300.0 | [('Rated 4.0', "RATED\n Great food and proper... | [] | Buffet | Banashankari |
| 4 | https://www.zomato.com/bangalore/grand-village... | 10, 3rd Floor, Lakshmi Associates, Gandhi Baza... | Grand Village | False | False | 3.8 | 166 | +91 8026612447\r\n+91 9901210005 | Basavanagudi | Casual Dining | Panipuri, Gol Gappe | North Indian, Rajasthani | 600.0 | [('Rated 4.0', 'RATED\n Very good restaurant ... | [] | Buffet | Banashankari |
len(df['location'].unique())
92
locations = pd.DataFrame({"Name":df['location'].unique()})
geolocator = Nominatim(user_agent = "app")
lat = []
lon = []
for location in locations['Name']:
    location = geolocator.geocode(location)
    if location is None:
        lat.append(np.nan)
        lon.append(np.nan)
    else:
        lat.append(location.latitude)
        lon.append(location.longitude)
locations['lat'] = lat
locations['lon'] = lon
locations.head()
| | Name | lat | lon |
|---|---|---|---|
| 0 | Banashankari | 15.887678 | 75.704678 |
| 1 | Basavanagudi | 12.941726 | 77.575502 |
| 2 | Mysore Road | 12.946662 | 77.530090 |
| 3 | Jayanagar | 27.643927 | 83.052805 |
| 4 | Kumaraswamy Layout | 12.908149 | 77.555318 |
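Two of the five rows above (Banashankari at 15.89, 75.70 and Jayanagar at 27.64, 83.05) lie far from Bengaluru, whose centre the base map below places near 12.97, 77.59. That suggests a bare neighbourhood name can match a place elsewhere. One possible remedy, sketched here under assumptions, is to append the city to each query and to flag results that fall outside a rough bounding box. The box limits and the helper names `qualify` and `in_bengaluru` are illustrative assumptions, not part of the original notebook.

```python
# Assumed rough bounding box around Bengaluru (illustrative values only)
BLR_LAT = (12.7, 13.3)
BLR_LON = (77.3, 77.9)

def qualify(name: str, city: str = "Bengaluru, India") -> str:
    """Build a geocoding query that pins a neighbourhood name to the city."""
    return f"{name}, {city}"

def in_bengaluru(lat: float, lon: float) -> bool:
    """Return True when a coordinate falls inside the assumed bounding box."""
    return BLR_LAT[0] <= lat <= BLR_LAT[1] and BLR_LON[0] <= lon <= BLR_LON[1]

# Query string that could be passed to geolocator.geocode instead of the bare name
print(qualify("Banashankari"))
# Coordinates copied from the table above
print(in_bengaluru(15.887678, 75.704678))
print(in_bengaluru(12.941726, 77.575502))
```

Rows that fail the check could be set to NaN so that the later `dropna()` removes them before they reach a heatmap.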
R_locations = pd.DataFrame(df['location'].value_counts().reset_index())
R_locations.columns=['Name','count']
R_locations.head()
| | Name | count |
|---|---|---|
| 0 | BTM | 3906 |
| 1 | Koramangala 5th Block | 2297 |
| 2 | HSR | 2004 |
| 3 | Indiranagar | 1803 |
| 4 | JP Nagar | 1717 |
print(locations.shape)
print(R_locations.shape)
(92, 3)
(92, 2)
Restaurant_locations = R_locations.merge(locations, on = 'Name', how = "left").dropna()
Restaurant_locations.head()
| | Name | count | lat | lon |
|---|---|---|---|---|
| 0 | BTM | 3906 | 45.954851 | -112.496595 |
| 1 | Koramangala 5th Block | 2297 | 12.934843 | 77.618977 |
| 2 | HSR | 2004 | 18.147500 | 41.538889 |
| 3 | Indiranagar | 1803 | 12.973291 | 77.640467 |
| 4 | JP Nagar | 1717 | 12.265594 | 76.646540 |
Restaurant_locations['count'].max()
3906
def generateBaseMap(default_location = [12.97, 77.59], default_zoom_start=12):
    # Base map centred on Bengaluru
    base_map = folium.Map(location = default_location, zoom_start = default_zoom_start)
    return base_map
basemap = generateBaseMap()
basemap
Restaurant_locations[['lat','lon','count']]
| | lat | lon | count |
|---|---|---|---|
| 0 | 45.954851 | -112.496595 | 3906 |
| 1 | 12.934843 | 77.618977 | 2297 |
| 2 | 18.147500 | 41.538889 | 2004 |
| 3 | 12.973291 | 77.640467 | 1803 |
| 4 | 12.265594 | 76.646540 | 1717 |
| ... | ... | ... | ... |
| 87 | 13.100698 | 77.596345 | 4 |
| 88 | 12.984852 | 77.540063 | 3 |
| 89 | 12.927441 | 77.515522 | 2 |
| 90 | 13.001970 | 77.528839 | 1 |
| 91 | 13.032942 | 77.527325 | 1 |
91 rows × 3 columns
HeatMap(Restaurant_locations[['lat','lon','count']],zoom = 20,radius = 15).add_to(basemap)
basemap
FastMarkerCluster(data=Restaurant_locations[['lat','lon','count']].values.tolist()).add_to(basemap)
basemap
# 'NEW' and empty ratings were already removed above, so only the numeric conversion remains
df['rate'] = pd.to_numeric(df['rate'])
df.groupby(['location'])['rate'].mean().sort_values(ascending = False)
location
Lavelle Road 4.141788
Koramangala 3rd Block 4.020419
St. Marks Road 4.017201
Koramangala 5th Block 4.006661
Church Street 3.992125
...
Rammurthy Nagar 3.346154
North Bangalore 3.340000
Peenya 3.200000
Bommanahalli 3.190972
Old Madras Road 3.181818
Name: rate, Length: 92, dtype: float64
df.groupby(['location'])['rate'].mean()
location
BTM 3.571659
Banashankari 3.649866
Banaswadi 3.492161
Bannerghatta Road 3.506260
Basavanagudi 3.671092
...
West Bangalore 3.366667
Whitefield 3.623209
Wilson Garden 3.536364
Yelahanka 3.700000
Yeshwantpur 3.502679
Name: rate, Length: 92, dtype: float64
avg_rating = df.groupby(['location'])['rate'].mean().values
loc = df.groupby(['location'])['rate'].mean().index
geolocator = Nominatim(user_agent = "app")
lat=[]
lon=[]
for location in loc:
    location = geolocator.geocode(location)
    if location is None:
        lat.append(np.nan)
        lon.append(np.nan)
    else:
        lat.append(location.latitude)
        lon.append(location.longitude)
rating = pd.DataFrame()
rating['location'] = loc
rating['lat'] = lat
rating['lon'] = lon
rating['avg_rating'] = avg_rating
rating.isna().sum()
location      0
lat           1
lon           1
avg_rating    0
dtype: int64
rating=rating.dropna()
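# Start from a fresh map so the count heatmap and markers added earlier are not overlaid
basemap = generateBaseMap()
HeatMap(rating[['lat','lon','avg_rating']],zoom = 20,radius = 15).add_to(basemap)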
basemap
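The average-rating map geocodes the same 92 location names a second time. A possible shortcut, sketched below, is to reuse the coordinates already stored in `locations` through a merge. The two small frames are illustrative stand-ins for the notebook's `df` and `locations`, and the names `df_demo`, `locations_demo` and `rating_demo` are hypothetical.

```python
import pandas as pd

# Tiny stand-ins for the notebook's `df` and `locations` frames (illustrative values)
df_demo = pd.DataFrame({
    "location": ["Basavanagudi", "Basavanagudi", "Kumaraswamy Layout"],
    "rate": [3.8, 3.6, 4.0],
})
locations_demo = pd.DataFrame({
    "Name": ["Basavanagudi", "Kumaraswamy Layout"],
    "lat": [12.941726, 12.908149],
    "lon": [77.575502, 77.555318],
})

# Average rating per location, joined to the coordinates that were geocoded once
rating_demo = (
    df_demo.groupby("location", as_index=False)["rate"].mean()
    .rename(columns={"location": "Name", "rate": "avg_rating"})
    .merge(locations_demo, on="Name", how="left")
    .dropna(subset=["lat", "lon"])
)
print(rating_demo)
```

Besides saving time, this avoids sending repeated requests to the geocoding service.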
df2 = df[df['cuisines'] == 'North Indian']
df2.head()
| | url | address | name | online_order | book_table | rate | votes | phone | location | rest_type | dish_liked | cuisines | cost_for_2 | reviews_list | menu_item | listed_type | city |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | https://www.zomato.com/bangalore/timepass-dinn... | 37, 5-1, 4th Floor, Bosco Court, Gandhi Bazaar... | Timepass Dinner | True | False | 3.8 | 286 | +91 9980040002\r\n+91 9980063005 | Basavanagudi | Casual Dining | Onion Rings, Pasta, Kadhai Paneer, Salads, Sal... | North Indian | 600.0 | [('Rated 3.0', 'RATED\n Food 3/5\nAmbience 3/... | [] | Buffet | Banashankari |
| 50 | https://www.zomato.com/bangalore/petoo-banasha... | 276, Ground Floor, 100 Feet Outer Ring Road, B... | Petoo | False | False | 3.7 | 21 | +91 8026893211 | Banashankari | Quick Bites | | North Indian | 450.0 | [('Rated 2.0', 'RATED\n This is a neatly made... | [] | Delivery | Banashankari |
| 84 | https://www.zomato.com/bangalore/krishna-sagar... | 38, 22nd Main, 22nd Cross, Opposite BDA, 2nd S... | Krishna Sagar | False | False | 3.5 | 31 | +91 8892752997\r\n+91 7204780429 | Banashankari | Quick Bites | | North Indian | 200.0 | [('Rated 1.0', 'RATED\n Worst experience with... | [] | Delivery | Banashankari |
| 88 | https://www.zomato.com/bangalore/nandhini-delu... | 304, Opposite Apollo Public School, 100 Feet R... | Nandhini Deluxe | False | False | 2.6 | 283 | 080 26890011\r\n080 26890033 | Banashankari | Casual Dining | Biryani, Chicken Guntur, Thali, Buttermilk, Ma... | North Indian | 600.0 | [('Rated 3.0', 'RATED\n Ididnt like much.\n\n... | [] | Delivery | Banashankari |
| 102 | https://www.zomato.com/bangalore/katriguppe-do... | 8, Katriguppe Main Road, Vivekananda Nagar, 3r... | Katriguppe Donne Biryani | False | False | 3.2 | 4 | +91 9964847091 | Banashankari | Quick Bites | | North Indian | 300.0 | [] | [] | Delivery | Banashankari |
north_india = df2.groupby('location')['url'].count().reset_index()
north_india.columns = ['Name','count']
north_india.head()
| | Name | count |
|---|---|---|
| 0 | BTM | 241 |
| 1 | Banashankari | 27 |
| 2 | Banaswadi | 9 |
| 3 | Bannerghatta Road | 57 |
| 4 | Basavanagudi | 16 |
north_india = north_india.merge(locations, on = "Name", how = 'left').dropna()
basemap=generateBaseMap()
HeatMap(north_india[['lat','lon','count']].values.tolist(),zoom=20,radius=15).add_to(basemap)
basemap
def Heatmap_Zone(zone):
    # Keep only restaurants whose cuisines value is exactly `zone`
    df3 = df[df['cuisines'] == zone]
    df_zone = df3.groupby(['location'],as_index=False)['url'].agg('count')
    df_zone.columns = ['Name','count']
    df_zone = df_zone.merge(locations,on="Name",how='left').dropna()
    basemap = generateBaseMap()
    HeatMap(df_zone[['lat','lon','count']].values.tolist(),zoom=20,radius=15).add_to(basemap)
    return basemap
df['cuisines'].unique()
array(['North Indian, Mughlai, Chinese', 'Chinese, North Indian, Thai',
'Cafe, Mexican, Italian', ..., 'Tibetan, Nepalese',
'North Indian, Street Food, Biryani',
'North Indian, Chinese, Arabian, Momos'], dtype=object)
Heatmap_Zone('South Indian')
Heatmap_Zone('Italian')
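`Heatmap_Zone` compares the whole `cuisines` string with the requested zone, so a restaurant listed as 'South Indian, North Indian' is not counted for 'South Indian'. If the intent is to include every restaurant that serves a cuisine, a filter along the following lines could be used instead. This is a sketch under that assumption, and `serves` is a hypothetical helper name.

```python
import pandas as pd

def serves(cuisines: pd.Series, cuisine: str) -> pd.Series:
    """Boolean mask: True where the comma-separated list includes `cuisine`."""
    wanted = cuisine.strip().lower()
    return cuisines.fillna("").apply(
        lambda entry: wanted in [c.strip().lower() for c in entry.split(",")]
    )

sample = pd.Series([
    "North Indian, Mughlai, Chinese",
    "South Indian, North Indian",
    "Cafe, Mexican, Italian",
    "South Indian",
])
mask = serves(sample, "South Indian")
print(mask.tolist())
```

Matching whole list entries rather than substrings keeps a query such as 'Indian' from matching both 'North Indian' and 'South Indian'.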
values = df['online_order'].value_counts()
# value_counts sorts by frequency, so derive each label from the index to keep labels and counts paired
labels = values.index.map({True: 'Accepted', False: 'Not Accepted'}).tolist()
colors = ['pink', 'darkgreen']
fig = go.Figure(data=[go.Pie(labels=labels,
                             values=values,hole=.3)])
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=20,
                  marker=dict(colors=colors, line=dict(color='#000000', width=3)))
fig.update_layout(title="Online ordering available? ",
                  title_font={'size': 30},
                  )
fig.show()
values = df['book_table'].value_counts()
# value_counts sorts by frequency, so derive each label from the index to keep labels and counts paired
labels = values.index.map({True: 'Accepted', False: 'Not Accepted'}).tolist()
colors = ['cyan', 'darkgreen']
fig = go.Figure(data=[go.Pie(labels=labels,
                             values=values,hole=.3)])
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=20,
                  marker=dict(colors=colors, line=dict(color='#000000', width=3)))
fig.update_layout(title="Table booking available? ",
                  title_font={'size': 30},
                  )
fig.show()
values = df['cuisines'].value_counts()[:20]
labels=values.index
text=values.index
fig = go.Figure(data=[go.Pie(values=values,labels=labels,hole=.3)])
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=20,
marker=dict(line=dict(color='#000000', width=3)))
fig.update_layout(title="Most popular cuisines of Bangalore ",
                  title_font={'size': 30},
                  )
fig.show()
fig = px.box(df, x = 'online_order', y = 'cost_for_2', color = 'online_order')
fig.update_layout(title = "Cost comparison for Online order",
                  title_font={'size': 30},template = 'simple_white'
                  )
fig.show()
dfupd = df.copy()
dfupd['update_dish_liked'] = dfupd['dish_liked'].apply(lambda x : x.split(',') if type(x)==str else [''])
rest = dfupd['rest_type'].value_counts()[:9].index
stopwords = set(STOPWORDS)

def produce_wordcloud(rest):
    # Draw one word cloud of liked dishes per restaurant type on a 3x3 grid
    plt.figure(figsize=(20,30))
    for i,restaurant in enumerate(rest):
        plt.subplot(3,3,i+1)
        dishes=''
        data=dfupd[dfupd['rest_type']==restaurant]
        for word in data['dish_liked']:
            # Convert each token to lowercase
            words=[token.lower() for token in word.split()]
            dishes=dishes+ " ".join(words)+" "
        wordcloud = WordCloud(max_font_size=None, background_color='black', collocations=False,stopwords = stopwords,width=1200, height=1200).generate(dishes)
        plt.imshow(wordcloud)
        plt.title(restaurant)
        plt.axis("off")

produce_wordcloud(rest)
def reviewwords(restaurant):
    dataset=dfupd[dfupd['rest_type']==restaurant]
    total_review=' '
    for review in dataset['reviews_list']:
        review=review.lower()
        # Keep letters only, then remove the 'rated' marker
        review=re.sub('[^a-zA-Z]', ' ',review)
        review=re.sub('rated', ' ',review)
        # Note: this removes every letter 'x', including those inside ordinary words
        review=re.sub('x',' ',review)
        review=re.sub(' +',' ',review)
        total_review=total_review + str(review)
    wordcloud = WordCloud(width = 800, height = 800,
                          background_color ='black',
                          stopwords = set(STOPWORDS),
                          min_font_size = 10).generate(total_review)
    # plot the WordCloud image
    plt.figure(figsize = (8, 8))
    plt.imshow(wordcloud)
    plt.axis("off")
reviewwords('Quick Bites')
reviewwords('Cafe')
reviewwords('Delivery')
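`reviewwords` treats each `reviews_list` cell as one long string and strips characters with regular expressions. The previews earlier in the notebook suggest that each cell is the text form of a Python list of (rating, review) tuples. Assuming that holds for most rows, the review bodies could be extracted before cleaning, as in this sketch; `review_texts` is a hypothetical helper, and cells that fail to parse are skipped.

```python
import ast

def review_texts(raw: str) -> list:
    """Return the review bodies from one `reviews_list` cell, or [] if it cannot be parsed."""
    try:
        parsed = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return []
    if not isinstance(parsed, list):
        return []
    texts = []
    for item in parsed:
        # A well-formed item is expected to be a (rating, text) pair
        if isinstance(item, tuple) and len(item) == 2 and isinstance(item[1], str):
            texts.append(item[1].replace("RATED\n", "").strip())
    return texts

cell = "[('Rated 4.0', 'RATED\\n  Great food and proper service'), ('Rated 3.0', 'RATED\\n  Average')]"
print(review_texts(cell))
print(review_texts("[]"))
```

Working on the extracted text means the rating prefix and the 'RATED' marker never enter the word cloud, so fewer blanket substitutions are needed.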
city_counts = df['city'].value_counts()
# Take names and counts from the same Series so each bar is paired with its own city
fig=px.bar(x = city_counts.index, y = city_counts.values, labels = dict(x = "City Name", y = "Total Count"),color_continuous_scale = "Cividis", color = city_counts.index)
fig.update_layout(title="Location wise counts for Restaurants ",
                  title_font={'size': 30},template='simple_white'
                  )
fig.update_traces(marker_line_color='black',
marker_line_width=2, opacity=1)
fig.show()
# Note: df2 holds only the restaurants whose cuisines value is exactly 'North Indian'
loc_plt=pd.crosstab(df2['rate'],df2['city'])
fig=px.bar(loc_plt,x=loc_plt.index,y=loc_plt.columns,barmode='stack',opacity=1)
fig.update_layout(title="Location wise Rating",
                  title_font={'size': 30},
                  template='simple_white'
                  )
fig.update_traces(marker_line_color='black',
marker_line_width=0.5, opacity=0.8)
fig.show()
fig=px.histogram(df['listed_type'], labels = dict(value = 'listed_type'))
fig.update_layout(title="Type of Services",
                  title_font={'size': 30},template='simple_white'
                  )
fig.update_traces(marker_color='pink', marker_line_color='black',
marker_line_width=2, opacity=1)
fig.show()
fig = px.histogram(df['cost_for_2'], labels = dict(value = 'Cost Range'), nbins = 10)
fig.update_layout(title = "Cost of Restaurants",
                  title_font = {'size': 30}, template = 'simple_white'
                  )
fig.update_traces(marker_color = 'cyan', marker_line_color = 'black',
marker_line_width = 2, opacity = 1)
fig.show()
chains = df['name'].value_counts()[:10]
fig = px.bar(y = chains, x = chains.index, labels = dict(x = 'Name', y = 'Count'),color_continuous_scale = "Agsunset", color = chains.index)
fig.update_layout(title = "Most famous restaurant chains",
                  title_font = {'size': 30},template='simple_white'
                  )
fig.update_traces( marker_line_color = 'black',
marker_line_width = 2, opacity=1)
fig.show()
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
df.head()
| | url | address | name | online_order | book_table | rate | votes | phone | location | rest_type | dish_liked | cuisines | cost_for_2 | reviews_list | menu_item | listed_type | city |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.zomato.com/bangalore/jalsa-banasha... | 942, 21st Main Road, 2nd Stage, Banashankari, ... | Jalsa | True | True | 4.1 | 775 | 080 42297555\r\n+91 9743772233 | Banashankari | Casual Dining | Pasta, Lunch Buffet, Masala Papad, Paneer Laja... | North Indian, Mughlai, Chinese | 800.0 | [('Rated 4.0', 'RATED\n A beautiful place to ... | [] | Buffet | Banashankari |
| 1 | https://www.zomato.com/bangalore/spice-elephan... | 2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ... | Spice Elephant | True | False | 4.1 | 787 | 080 41714161 | Banashankari | Casual Dining | Momos, Lunch Buffet, Chocolate Nirvana, Thai G... | Chinese, North Indian, Thai | 800.0 | [('Rated 4.0', 'RATED\n Had been here for din... | [] | Buffet | Banashankari |
| 2 | https://www.zomato.com/SanchurroBangalore?cont... | 1112, Next to KIMS Medical College, 17th Cross... | San Churro Cafe | True | False | 3.8 | 918 | +91 9663487993 | Banashankari | Cafe, Casual Dining | Churros, Cannelloni, Minestrone Soup, Hot Choc... | Cafe, Mexican, Italian | 800.0 | [('Rated 3.0', "RATED\n Ambience is not that ... | [] | Buffet | Banashankari |
| 3 | https://www.zomato.com/bangalore/addhuri-udupi... | 1st Floor, Annakuteera, 3rd Stage, Banashankar... | Addhuri Udupi Bhojana | False | False | 3.7 | 88 | +91 9620009302 | Banashankari | Quick Bites | Masala Dosa | South Indian, North Indian | 300.0 | [('Rated 4.0', "RATED\n Great food and proper... | [] | Buffet | Banashankari |
| 4 | https://www.zomato.com/bangalore/grand-village... | 10, 3rd Floor, Lakshmi Associates, Gandhi Baza... | Grand Village | False | False | 3.8 | 166 | +91 8026612447\r\n+91 9901210005 | Basavanagudi | Casual Dining | Panipuri, Gol Gappe | North Indian, Rajasthani | 600.0 | [('Rated 4.0', 'RATED\n Very good restaurant ... | [] | Buffet | Banashankari |
The encoding code is logical and runs, but when it is applied to the column that contains review sentences it creates 7,777 columns. Removing the rows that contain review sentences would discard a lot of valuable data. Running it also takes a long time and needs more computational power than is available here, so that step is not run or included for now.
cuisines = df['cuisines']
cuisines_split = cuisines.str.split(',', expand=True)
cuisines_split.columns = [f"{i}" for i in range(cuisines_split.shape[1])]
cuisines_encoded = pd.get_dummies(cuisines_split)
df_encoded1 = pd.concat([df, cuisines_encoded], axis=1)
df_encoded1['count_dish_liked'] = 0
for index, row in df_encoded1.iterrows():
    if isinstance(row['dish_liked'], float):
        # A float here means NaN (no liked dishes recorded), so the count is 0
        dish_liked_count = 0
    else:
        dish_liked_list = row['dish_liked'].split(',')
        dish_liked_count = len(dish_liked_list)
    df_encoded1.at[index, 'count_dish_liked'] = dish_liked_count
df_encoded1['count_rest_type'] = 0
for index, row in df_encoded1.iterrows():
    if isinstance(row['rest_type'], float):
        # A float here means NaN (no restaurant type recorded), so the count is 0
        rest_type_count = 0
    else:
        rest_type_list = row['rest_type'].split(',')
        rest_type_count = len(rest_type_list)
    df_encoded1.at[index, 'count_rest_type'] = rest_type_count
df_encoded1
| | url | address | name | online_order | book_table | rate | votes | phone | location | rest_type | ... | 7_ Kerala | 7_ North Indian | 7_ Pizza | 7_ Rolls | 7_ Salad | 7_ Seafood | 7_ South Indian | 7_ Thai | count_dish_liked | count_rest_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.zomato.com/bangalore/jalsa-banasha... | 942, 21st Main Road, 2nd Stage, Banashankari, ... | Jalsa | True | True | 4.1 | 775 | 080 42297555\r\n+91 9743772233 | Banashankari | Casual Dining | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 1 |
| 1 | https://www.zomato.com/bangalore/spice-elephan... | 2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ... | Spice Elephant | True | False | 4.1 | 787 | 080 41714161 | Banashankari | Casual Dining | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 1 |
| 2 | https://www.zomato.com/SanchurroBangalore?cont... | 1112, Next to KIMS Medical College, 17th Cross... | San Churro Cafe | True | False | 3.8 | 918 | +91 9663487993 | Banashankari | Cafe, Casual Dining | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 2 |
| 3 | https://www.zomato.com/bangalore/addhuri-udupi... | 1st Floor, Annakuteera, 3rd Stage, Banashankar... | Addhuri Udupi Bhojana | False | False | 3.7 | 88 | +91 9620009302 | Banashankari | Quick Bites | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 4 | https://www.zomato.com/bangalore/grand-village... | 10, 3rd Floor, Lakshmi Associates, Gandhi Baza... | Grand Village | False | False | 3.8 | 166 | +91 8026612447\r\n+91 9901210005 | Basavanagudi | Casual Dining | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 41660 | https://www.zomato.com/bangalore/the-farm-hous... | 136, SAP Labs India, KIADB Export Promotion In... | The Farm House Bar n Grill | False | False | 3.7 | 34 | +91 9980121279\n+91 9900240646 | Whitefield | Casual Dining, Bar | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 |
| 41661 | https://www.zomato.com/bangalore/bhagini-2-whi... | 139/C1, Next To GR Tech Park, Pattandur Agraha... | Bhagini | False | False | 2.5 | 81 | 080 65951222 | Whitefield | Casual Dining, Bar | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 |
| 41662 | https://www.zomato.com/bangalore/best-brews-fo... | Four Points by Sheraton Bengaluru, 43/3, White... | Best Brews - Four Points by Sheraton Bengaluru... | False | False | 3.6 | 27 | 080 40301477 | Whitefield | Bar | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 41663 | https://www.zomato.com/bangalore/chime-sherato... | Sheraton Grand Bengaluru Whitefield Hotel & Co... | Chime - Sheraton Grand Bengaluru Whitefield Ho... | False | True | 4.3 | 236 | 080 49652769 | ITPL Main Road, Whitefield | Bar | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 1 |
| 41664 | https://www.zomato.com/bangalore/the-nest-the-... | ITPL Main Road, KIADB Export Promotion Industr... | The Nest - The Den Bengaluru | False | False | 3.4 | 13 | +91 8071117272 | ITPL Main Road, Whitefield | Bar, Casual Dining | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 |
41418 rows × 496 columns
df_encoded1.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 41418 entries, 0 to 41664
Columns: 496 entries, url to count_rest_type
dtypes: bool(2), float64(2), int64(3), object(12), uint8(477)
memory usage: 25.6+ MB
df_encoded1.to_csv('Revised_Zomato.csv', index = False)
data = df_encoded1.copy()  # work on a copy so df_encoded1 keeps its text columns
data.drop(['url', 'address', 'name', 'phone', 'menu_item','rest_type', 'cuisines', 'dish_liked', 'reviews_list'], axis=1, inplace=True)
def convert_string_to_num(col):
    """Map each distinct value in col to an integer code, in order of first appearance."""
    values = col.unique()
    key = {}
    for i, val in enumerate(values):
        key[val] = i
    col = col.map(key)
    return col
data['online_order'] = convert_string_to_num(data['online_order'])
data['book_table'] = convert_string_to_num(data['book_table'])
data['location'] = convert_string_to_num(data['location'])
data['listed_type'] = convert_string_to_num(data['listed_type'])
data['city'] = convert_string_to_num(data['city'])
data.head()
| | online_order | book_table | rate | votes | location | cost_for_2 | listed_type | city | 0_ | 0_African | ... | 7_ Kerala | 7_ North Indian | 7_ Pizza | 7_ Rolls | 7_ Salad | 7_ Seafood | 7_ South Indian | 7_ Thai | count_dish_liked | count_rest_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 4.1 | 775 | 0 | 800.0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 1 |
| 1 | 0 | 1 | 4.1 | 787 | 0 | 800.0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 1 |
| 2 | 0 | 1 | 3.8 | 918 | 0 | 800.0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 2 |
| 3 | 1 | 1 | 3.7 | 88 | 0 | 300.0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 4 | 1 | 1 | 3.8 | 166 | 1 | 600.0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 |
5 rows × 487 columns
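The `convert_string_to_num` helper above assigns integer codes in order of first appearance, which is also what `pd.factorize` does. The sketch below compares the two on a small toy Series (the values are illustrative, not taken from the dataset). Note that such codes impose an arbitrary numeric order on categories; tree models usually tolerate this, while a neural network will treat the codes as magnitudes.

```python
import pandas as pd


def convert_string_to_num(col):
    # Same logic as the notebook helper: code = order of first appearance.
    key = {val: i for i, val in enumerate(col.unique())}
    return col.map(key)


# Toy column (hypothetical values, for illustration only).
toy = pd.Series(["Banashankari", "Basavanagudi", "Banashankari", "Whitefield"])

codes_manual = convert_string_to_num(toy)
codes_factorize, uniques = pd.factorize(toy)

print(codes_manual.tolist())
print(codes_factorize.tolist())
print(list(uniques))
```

One difference to keep in mind: by default `pd.factorize` codes missing values as -1, so the two approaches may diverge on columns that contain NaN.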
# Split data into training and testing sets
X = data.drop(['rate'], axis=1)
y = data['rate']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)
# Train the model
rf = RandomForestRegressor(n_estimators = 100, random_state = 42)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean Squared Error: {:.2f}".format(mse))
print("R2 Score: {:.2f}".format(r2))
Mean Squared Error: 0.02
R2 Score: 0.92
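An MSE of 0.02 corresponds to an RMSE of roughly 0.14 rating points. Because these numbers come from a single 80/20 split, a k-fold estimate can show how stable the score is. The following is a minimal sketch on synthetic data, assuming the same estimator class; `X_toy`, `y_toy`, and `rf_toy` are hypothetical names, and the printed values say nothing about the Zomato results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the restaurant features (not the Zomato data).
rng = np.random.default_rng(42)
X_toy = rng.normal(size=(200, 5))
y_toy = 2.0 * X_toy[:, 0] + rng.normal(scale=0.1, size=200)

rf_toy = RandomForestRegressor(n_estimators=20, random_state=42)

# scoring="r2" returns one R2 value per fold.
r2_folds = cross_val_score(rf_toy, X_toy, y_toy, cv=5, scoring="r2")

print("R2 per fold:", np.round(r2_folds, 2))
print("Mean R2: {:.2f} (std {:.2f})".format(r2_folds.mean(), r2_folds.std()))
```

On the real data, `X` and `y` from the cell above could be passed in place of the toy arrays, at the cost of fitting the forest five times.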
plt.scatter(rf.predict(X_train), rf.predict(X_train) - y_train,
color = "green", s = 10, label = "Train data")
plt.scatter(rf.predict(X_test), rf.predict(X_test) - y_test,
color = "blue", s = 10, label = "Test data")
plt.legend(loc = "upper right")
plt.title("Residual errors")
plt.show()
residuals = y_test - y_pred
plt.scatter(y_pred, residuals)
plt.xlabel('Predicted Ratings')
plt.ylabel('Residuals')
plt.title('Residual Plot')
plt.show()
plt.scatter(y_test, y_pred)
plt.xlabel('Actual Ratings')
plt.ylabel('Predicted Ratings')
plt.title('Actual vs Predicted Ratings')
plt.show()
residuals = y_test - y_pred
plt.hist(residuals, bins=30)
plt.xlabel('Residuals')
plt.ylabel('Frequency')
plt.title('Histogram of Residuals')
plt.show()
importances = pd.Series(rf.feature_importances_, index=X.columns)
importances.nlargest(10).plot(kind='barh')
plt.title('Feature Importances')
plt.xlabel('Importance')
plt.ylabel('Features')
plt.show()
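The `feature_importances_` values plotted above are impurity-based, which can favour features with many distinct values. Permutation importance on held-out data is a common cross-check. This is a sketch on synthetic data where only the first feature carries signal; the column names `f0` to `f3` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: the target depends on f0 only.
rng = np.random.default_rng(0)
X_toy = pd.DataFrame(rng.normal(size=(300, 4)), columns=["f0", "f1", "f2", "f3"])
y_toy = 3.0 * X_toy["f0"] + rng.normal(scale=0.1, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)
rf_toy = RandomForestRegressor(n_estimators=30, random_state=42).fit(X_tr, y_tr)

# Shuffle each column of the test set in turn and measure the drop in score.
result = permutation_importance(rf_toy, X_te, y_te, n_repeats=5, random_state=42)
perm_importances = pd.Series(result.importances_mean, index=X_toy.columns)

print(perm_importances.sort_values(ascending=False))
```

With 486 feature columns this is slower than reading `feature_importances_`, so it may be worth restricting it to the top candidates.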
data.corr()
| | online_order | book_table | rate | votes | location | cost_for_2 | listed_type | city | 0_ | 0_African | ... | 7_ Kerala | 7_ North Indian | 7_ Pizza | 7_ Rolls | 7_ Salad | 7_ Seafood | 7_ South Indian | 7_ Thai | count_dish_liked | count_rest_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| online_order | 1.000000 | -0.054771 | -0.069354 | 0.013319 | 0.049634 | -0.179486 | 0.239442 | 0.054101 | -0.010046 | -0.012807 | ... | -0.013290 | 0.019231 | -0.004991 | -0.005023 | 0.012428 | -0.007103 | -0.011232 | 0.009615 | -0.089042 | 0.057937 |
| book_table | -0.054771 | 1.000000 | -0.426095 | -0.393434 | -0.032901 | 0.266558 | -0.114141 | -0.029076 | 0.005889 | -0.011464 | ... | 0.007791 | -0.008621 | -0.025116 | 0.002944 | -0.028408 | -0.023195 | 0.006585 | -0.016401 | -0.441302 | -0.225198 |
| rate | -0.069354 | -0.426095 | 1.000000 | 0.434764 | 0.032485 | -0.115575 | 0.033588 | 0.024561 | -0.008686 | 0.035868 | ... | -0.005678 | 0.003932 | 0.025919 | 0.009457 | 0.013649 | 0.011144 | -0.003539 | 0.006303 | 0.600934 | 0.174914 |
| votes | 0.013319 | -0.393434 | 0.434764 | 1.000000 | 0.007221 | -0.116102 | 0.070343 | 0.026530 | -0.005351 | 0.002553 | ... | 0.001125 | -0.003938 | 0.003565 | -0.001002 | 0.005231 | 0.018018 | -0.004749 | 0.007788 | 0.438448 | 0.201077 |
| location | 0.049634 | -0.032901 | 0.032485 | 0.007221 | 1.000000 | -0.019359 | 0.040580 | 0.359206 | 0.000783 | -0.011333 | ... | 0.010554 | -0.014721 | -0.004663 | -0.004576 | -0.015641 | -0.008894 | -0.000274 | -0.000465 | 0.027271 | 0.021978 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7_ Seafood | -0.007103 | -0.023195 | 0.011144 | 0.018018 | -0.008894 | -0.014954 | -0.010982 | -0.009201 | -0.000137 | -0.000174 | ... | -0.000181 | -0.000137 | -0.000181 | -0.000068 | -0.000118 | 1.000000 | -0.000153 | -0.000068 | 0.012592 | -0.004200 |
| 7_ South Indian | -0.011232 | 0.006585 | -0.003539 | -0.004749 | -0.000274 | 0.001945 | -0.000094 | 0.002178 | -0.000216 | -0.000275 | ... | -0.000286 | -0.000216 | -0.000286 | -0.000108 | -0.000187 | -0.000153 | 1.000000 | -0.000108 | -0.002900 | -0.006641 |
| 7_ Thai | 0.009615 | -0.016401 | 0.006303 | 0.007788 | -0.000465 | -0.010559 | 0.010058 | -0.003027 | -0.000097 | -0.000123 | ... | -0.000128 | -0.000097 | -0.000128 | -0.000048 | -0.000084 | -0.000068 | -0.000108 | 1.000000 | 0.008904 | -0.002969 |
| count_dish_liked | -0.089042 | -0.441302 | 0.600934 | 0.438448 | 0.027271 | 0.012874 | 0.046316 | 0.013475 | -0.012795 | 0.022703 | ... | 0.023560 | -0.001319 | 0.023560 | 0.008904 | 0.015422 | 0.012592 | -0.002900 | 0.008904 | 1.000000 | 0.153120 |
| count_rest_type | 0.057937 | -0.225198 | 0.174914 | 0.201077 | 0.021978 | -0.164643 | 0.065063 | 0.028248 | -0.005939 | -0.007572 | ... | -0.007858 | -0.005939 | 0.035763 | -0.002969 | -0.005143 | -0.004200 | -0.006641 | -0.002969 | 0.153120 | 1.000000 |
487 rows × 487 columns
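A 487 × 487 matrix is hard to read directly. In the visible part of the `rate` row, the strongest entries are `count_dish_liked` (0.60), `votes` (0.43), and `book_table` (-0.43). Since `convert_string_to_num` assigned 0 to the first value it saw, and the first row has `book_table` True, the negative sign appears to mean that restaurants offering table booking tend to be rated higher. The sketch below shows one way to pull out and rank a single column of the matrix; the DataFrame `toy` and its columns are hypothetical.

```python
import pandas as pd

# Toy frame with a target column and three features (illustrative values).
toy = pd.DataFrame({
    "rate":   [1, 2, 3, 4, 5],
    "feat_a": [2, 4, 6, 8, 10],   # perfectly correlated with rate
    "feat_b": [5, 3, 4, 1, 2],    # negatively correlated with rate
    "feat_c": [1, 0, 1, 0, 1],    # uncorrelated with rate
})

# Take the 'rate' column of the correlation matrix and drop the self-correlation.
corr_with_rate = toy.corr()["rate"].drop("rate")

# Rank by absolute value so strong negative correlations are not buried.
ranked = corr_with_rate.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```

On the notebook's data the same two lines would be applied to `data.corr()`, followed by `.head(10)` to keep the output short.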
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping
import matplotlib.pyplot as plt
# Scale numerical features
scaler = StandardScaler()
data[['cost_for_2', 'votes']] = scaler.fit_transform(data[['cost_for_2', 'votes']])
# Split the data into training and testing sets
X = data.drop(['rate'], axis=1)
y = data['rate']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 42)
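The scaling step above fits `StandardScaler` on the full dataset before the split, so statistics from the test rows influence the training features. A common alternative is to split first and fit the scaler on the training rows only. The sketch below shows that order on randomly generated toy data; `toy`, `num_cols`, and `scaler_toy` are hypothetical names.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Randomly generated stand-in for the numeric columns (not the Zomato data).
rng = np.random.default_rng(42)
toy = pd.DataFrame({
    "cost_for_2": rng.uniform(100, 2000, size=100),
    "votes": rng.integers(0, 5000, size=100).astype(float),
    "rate": rng.uniform(2.0, 5.0, size=100),
})

X_toy = toy.drop(columns=["rate"])
y_toy = toy["rate"]
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.3, random_state=42)
X_tr, X_te = X_tr.copy(), X_te.copy()

num_cols = ["cost_for_2", "votes"]
scaler_toy = StandardScaler()
# Learn mean and standard deviation from the training rows only...
X_tr[num_cols] = scaler_toy.fit_transform(X_tr[num_cols])
# ...and reuse those statistics for the test rows.
X_te[num_cols] = scaler_toy.transform(X_te[num_cols])

print(X_tr[num_cols].mean().round(6))
```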
# Define the model
model = Sequential()
model.add(Dense(128, input_dim=X_train.shape[1], activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1))
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 128) 62336
dropout (Dropout) (None, 128) 0
dense_1 (Dense) (None, 64) 8256
dropout_1 (Dropout) (None, 64) 0
dense_2 (Dense) (None, 32) 2080
dropout_2 (Dropout) (None, 32) 0
dense_3 (Dense) (None, 1) 33
=================================================================
Total params: 72,705
Trainable params: 72,705
Non-trainable params: 0
_________________________________________________________________
# Compile the model
model.compile(loss='mean_squared_error', optimizer=Adam(learning_rate=0.01))
# Define early stopping callback
early_stopping = EarlyStopping(monitor='val_loss', patience=5)
# Fit the model
history = model.fit(X_train, y_train, epochs=20, batch_size=64, validation_data=(X_test, y_test), callbacks=[early_stopping])
Epoch 1/20
453/453 [==============================] - 6s 10ms/step - loss: 0.6735 - val_loss: 0.1122
Epoch 2/20
453/453 [==============================] - 4s 9ms/step - loss: 0.2021 - val_loss: 0.1079
Epoch 3/20
453/453 [==============================] - 4s 9ms/step - loss: 0.1588 - val_loss: 0.1048
Epoch 4/20
453/453 [==============================] - 5s 10ms/step - loss: 0.1302 - val_loss: 0.1068
Epoch 5/20
453/453 [==============================] - 4s 9ms/step - loss: 0.1096 - val_loss: 0.0971
Epoch 6/20
453/453 [==============================] - 4s 9ms/step - loss: 0.0995 - val_loss: 0.0952
Epoch 7/20
453/453 [==============================] - 4s 10ms/step - loss: 0.0954 - val_loss: 0.0973
Epoch 8/20
453/453 [==============================] - 4s 10ms/step - loss: 0.0928 - val_loss: 0.0924
Epoch 9/20
453/453 [==============================] - 6s 14ms/step - loss: 0.0925 - val_loss: 0.0938
Epoch 10/20
453/453 [==============================] - 4s 10ms/step - loss: 0.0913 - val_loss: 0.0887
Epoch 11/20
453/453 [==============================] - 7s 15ms/step - loss: 0.0912 - val_loss: 0.0880
Epoch 12/20
453/453 [==============================] - 4s 9ms/step - loss: 0.0908 - val_loss: 0.0904
Epoch 13/20
453/453 [==============================] - 5s 11ms/step - loss: 0.0906 - val_loss: 0.0925
Epoch 14/20
453/453 [==============================] - 4s 9ms/step - loss: 0.0899 - val_loss: 0.0863
Epoch 15/20
453/453 [==============================] - 4s 9ms/step - loss: 0.0912 - val_loss: 0.0905
Epoch 16/20
453/453 [==============================] - 4s 9ms/step - loss: 0.0903 - val_loss: 0.0920
Epoch 17/20
453/453 [==============================] - 5s 10ms/step - loss: 0.0911 - val_loss: 0.0896
Epoch 18/20
453/453 [==============================] - 4s 9ms/step - loss: 0.0909 - val_loss: 0.0899
Epoch 19/20
453/453 [==============================] - 4s 9ms/step - loss: 0.0922 - val_loss: 0.0938
# Plot the evaluation graph
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Evaluation')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()
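Training stopped at epoch 19 of 20, which matches `patience=5`: the lowest validation loss in the log is 0.0863 at epoch 14, and the next five epochs did not improve on it. The sketch below replays the logged `val_loss` values through a simplified version of the patience rule (it assumes no `min_delta`, and it is not the Keras implementation itself).

```python
# val_loss values copied from the training log above, epochs 1 to 19.
val_loss = [
    0.1122, 0.1079, 0.1048, 0.1068, 0.0971, 0.0952, 0.0973, 0.0924, 0.0938, 0.0887,
    0.0880, 0.0904, 0.0925, 0.0863, 0.0905, 0.0920, 0.0896, 0.0899, 0.0938,
]
patience = 5

best_loss = float("inf")
best_epoch = None
stopped_epoch = None
wait = 0  # epochs since the last improvement

for epoch, loss in enumerate(val_loss, start=1):
    if loss < best_loss:
        best_loss, best_epoch, wait = loss, epoch, 0
    else:
        wait += 1
        if wait >= patience:
            stopped_epoch = epoch
            break

print("best epoch:", best_epoch, "val_loss:", best_loss)
print("stopped at epoch:", stopped_epoch)
```

Because `EarlyStopping` was created without `restore_best_weights=True`, the model keeps the weights from the last epoch (validation loss 0.0938) rather than those from epoch 14. The lowest validation MSE of about 0.086 is also noticeably higher than the 0.02 reported for the random forest, although the two models were evaluated on different splits (30% versus 20% test size).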
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
df = pd.read_csv('zomato.csv', encoding='ISO-8859-1')
# Remove thousands separators (e.g. "1,200" -> "1200") before converting to float
df['approx_cost(for two people)'] = df['approx_cost(for two people)'].astype(str)
df['approx_cost(for two people)'] = df['approx_cost(for two people)'].apply(lambda x: x.replace(',', ''))
df['approx_cost(for two people)'] = df['approx_cost(for two people)'].astype(float)
def get_user_input():
    cuisine_type = input("What type of cuisine are you in the mood for? ")
    location = input("Where are you located? ")
    raw_cost = input("What is your budget for two people? ")
    # Parse inside the try block so that invalid input falls back to the default
    try:
        cost = float(raw_cost)
    except ValueError:
        print("Invalid budget input. Defaulting to 500.")
        cost = 500.0
    return cuisine_type, location, cost
def filter_restaurants(cuisine_type, location, cost):
    # Filter the restaurants based on cuisine type, location, and cost
    filtered_data = df[(df['cuisines'].str.contains(cuisine_type, na=False)) &
                       (df['location'].str.contains(location, na=False)) &
                       (df['approx_cost(for two people)'] <= cost)]
    return filtered_data

def calculate_similarity(vec1, vec2):
    # Calculate the cosine similarity between two vectors
    return cosine_similarity(vec1.reshape(1, -1), vec2.reshape(1, -1))[0][0]
def recommend_restaurants(cuisine_type, location, cost):
    # Filter the restaurants
    filtered_data = filter_restaurants(cuisine_type, location, cost)
    # Build the user input vector; only the last slot (the budget) is set
    n_features = len(filtered_data.columns) - 1
    user_input_vec = np.zeros(n_features)
    user_input_vec[-1] = cost
    # Calculate the similarity between the user input vector and each restaurant vector
    similarities = []
    for i in range(len(filtered_data)):
        restaurant_vec = np.zeros(n_features)
        restaurant_cuisine_types = filtered_data.iloc[i]['cuisines'].split(', ')
        for cuisine in restaurant_cuisine_types:
            # A flag is set only when the cuisine name is also a column name
            if cuisine in filtered_data.columns:
                cuisine_index = np.where(filtered_data.columns == cuisine)[0][0]
                restaurant_vec[cuisine_index] = 1
        restaurant_cost = filtered_data.iloc[i]['approx_cost(for two people)']
        restaurant_vec[-1] = restaurant_cost
        similarity = calculate_similarity(user_input_vec[1:], restaurant_vec[1:])
        similarities.append((i, similarity))
    # Sort the restaurants based on similarity and return the top 10 recommendations
    similarities.sort(key=lambda x: x[1], reverse=True)
    recommendations = filtered_data.iloc[[x[0] for x in similarities[:10]]]
    return recommendations[['name', 'location', 'cuisines', 'rate']]
# Get user input and recommend restaurants
cuisine_type, location, cost = get_user_input()
recommendations = recommend_restaurants(cuisine_type, location, cost)
print('Recommended restaurants:')
print(recommendations)
What type of cuisine are you in the mood for? North Indian
Where are you located? BTM
What is your budget for two people? 500
Recommended restaurants:
| | name | location | cuisines | rate |
|---|---|---|---|---|
| 922 | eat.fit | BTM | Healthy Food, North Indian, Biryani, Continent... | 4.5/5 |
| 928 | Hiyar Majhe Kolkata | BTM | Bengali, North Indian | 4.0/5 |
| 932 | Sri Lakshmi Dhaba | BTM | North Indian | 2.9/5 |
| 934 | Swadista Aahar | BTM | South Indian, North Indian, Chinese, Street Food | 4.1/5 |
| 940 | Swad Punjab Da | BTM | North Indian | 4.0/5 |
| 942 | Roti Wala | BTM | North Indian | 4.0/5 |
| 946 | Apna Punjab | BTM | North Indian, Chinese, Fast Food | 3.6/5 |
| 947 | Paratha Junction | BTM | North Indian, Chinese | 2.9/5 |
| 952 | Kullad Cafe | BTM | North Indian, Cafe, Fast Food, Beverages | 3.9/5 |
| 954 | Litti Twist | BTM | North Indian, Bihari | 4.1/5 |
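In the recommender above, cuisine names are looked up among the DataFrame's column names. If `zomato.csv` has no columns named after cuisines, each restaurant vector holds only the cost, every cosine similarity comes out as 1, and the top 10 is simply the first 10 filtered rows. This would also explain why restaurants rated 2.9/5 appear in the list. One possible alternative is to build a multi-hot cuisine matrix with `str.get_dummies` and compare it to the requested cuisines. The sketch below uses a hypothetical three-row frame and is not the notebook's method.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical restaurants (names and cuisines are illustrative only).
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "cuisines": ["North Indian", "North Indian, Chinese", "South Indian"],
})

# One column per cuisine, 1 where the restaurant serves it.
cuisine_matrix = toy["cuisines"].str.get_dummies(sep=", ")

# Encode the user's wish list in the same column order.
wanted = ["North Indian"]
user_vec = cuisine_matrix.columns.isin(wanted).astype(float).reshape(1, -1)

# Cosine similarity between the user vector and every restaurant row.
sims = cosine_similarity(user_vec, cuisine_matrix.to_numpy(dtype=float))[0]

ranked = toy.assign(similarity=sims).sort_values("similarity", ascending=False)
print(ranked[["name", "cuisines", "similarity"]])
```

Here a restaurant serving only the requested cuisine ranks above one that also serves others. Cost and rating are left out of the vector and could instead be applied as filters or as a secondary sort key.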